Conversation

ballista01
Member

This PR introduces status reporting for the EtcdCluster custom resource (v1alpha1), as proposed in issue #135. It allows users to observe the actual state of the managed etcd cluster directly from the EtcdCluster object. The original reconciliation logic is preserved.

Changes Implemented:

  1. API Definition (api/v1alpha1/etcdcluster_types.go):
    • Defined the EtcdClusterStatus struct.
    • Added the following fields to EtcdClusterStatus:
      • readyReplicas (int32)
      • members (int32)
      • phase (string)
      • conditions ([]metav1.Condition) and currentVersion (string) (TODO: reporting logic)
    • Enabled the /status subresource via the +kubebuilder:subresource:status marker.
  2. Generated Code:
    • Updated api/v1alpha1/zz_generated.deepcopy.go using make generate.
    • Updated the CRD manifest config/crd/bases/operator.etcd.io_etcdclusters.yaml using make generate / controller-gen crd.
  3. Controller Logic (internal/controller/etcdcluster_controller.go):
    • Modified the Reconcile method to gather status information:
      • Reads status.ReadyReplicas from the managed StatefulSet.
      • Reads etcd cluster member count (status.members) from the healthCheck results.
      • Sets status.phase based on the observed state during reconciliation (e.g., Creating, Scaling, Running, Degraded, Failed).
      • Logic for conditions and CurrentVersion is still to be implemented.
    • Implemented a helper function, updateStatusIfNeeded, for clarity and consistency when patching the status.
    • Used a defer statement to call updateStatusIfNeeded before Reconcile returns, so the latest calculated status is always patched via r.Status().Patch(ctx, ..., client.MergeFrom(oldStatus)); see the sketch after this list.
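
A minimal sketch of the defer-based status patching described above, assuming the kubebuilder-scaffolded reconciler; the import path, receiver fields, and exact signature of updateStatusIfNeeded are assumptions, not the final implementation:

package controller

import (
	"context"

	"k8s.io/apimachinery/pkg/api/equality"
	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log"

	// Assumed import path and alias for the v1alpha1 API package.
	ecv1alpha1 "go.etcd.io/etcd-operator/api/v1alpha1"
)

// EtcdClusterReconciler mirrors the kubebuilder-scaffolded reconciler (other fields elided).
type EtcdClusterReconciler struct {
	client.Client
	Scheme *runtime.Scheme
}

func (r *EtcdClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)

	etcdCluster := &ecv1alpha1.EtcdCluster{}
	if err := r.Get(ctx, req.NamespacedName, etcdCluster); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Snapshot the object before reconciliation mutates Status; it becomes the patch base.
	oldStatus := etcdCluster.DeepCopy()

	// Always attempt to patch the status with whatever was computed, even on early returns.
	defer func() {
		if err := r.updateStatusIfNeeded(ctx, etcdCluster, oldStatus); err != nil {
			logger.Error(err, "failed to patch EtcdCluster status")
		}
	}()

	// ... existing reconciliation logic fills in readyReplicas, members, phase ...

	return ctrl.Result{}, nil
}

// updateStatusIfNeeded patches the /status subresource only when the newly
// computed status differs from the previously observed one.
func (r *EtcdClusterReconciler) updateStatusIfNeeded(ctx context.Context, cluster, oldStatus *ecv1alpha1.EtcdCluster) error {
	if equality.Semantic.DeepEqual(oldStatus.Status, cluster.Status) {
		return nil // nothing changed; skip the no-op PATCH
	}
	return r.Status().Patch(ctx, cluster, client.MergeFrom(oldStatus))
}

Skipping unchanged statuses avoids an extra API call on every reconcile, and client.MergeFrom keeps the patch limited to the fields that actually changed.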

How to Test:

  1. Build and deploy the operator image from this PR branch: make docker-build docker-push IMG=<your-image> and make deploy IMG=<your-image>.
  2. Create an EtcdCluster instance: kubectl apply -f config/samples/operator_v1alpha1_etcdcluster.yaml (ensure spec.size > 0 and spec.version is set).
  3. Monitor the status updates: kubectl get etcdcluster <name> -o yaml --watch.
  4. Verify that status.readyReplicas, status.members, and status.phase are populated and change according to the cluster's lifecycle.
  5. Check operator logs for status update messages.

Future Considerations:

  • Implement population of status.currentVersion.
  • Implement handling and reporting using status.conditions.
  • Refine the status.phase logic for more accurate representation of intermediate and error states.

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ballista01
Once this PR has been reviewed and has the lgtm label, please assign jmhbnz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot

Hi @ballista01. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

// attempt processing again later. This could have been caused by a
// temporary network failure, or any other transient reason.
logger.Error(err, "Failed to get StatefulSet. Requesting requeue")
etcdCluster.Status.Phase = "Failed"
Contributor

Q: In a scenario where the StatefulSet already exists but the getStatefulSet call fails, having etcdCluster.Status.Phase set to Failed might seem misleading, wdyt?
Maybe Unknown makes more sense?

Member Author

I agree that Failed is misleading. However, Unknown is ambiguous to the user. How about Degraded? Either way, the user can find detailed info in the conditions field; etcdCluster.Status.Phase is now a high-level summary based on conditions.

Contributor

Sure, Degraded is better in this case.
I was wondering if we should also keep a separate condition, Unhealthy, to distinguish between Degraded (when a Get/Create call fails) and Unhealthy (when the health check fails).

Comment on lines 66 to 79
// ReadyReplicas is the number of pods targeted by this EtcdCluster with a Ready condition.
ReadyReplicas int32 `json:"readyReplicas,omitempty"`
// Members is the number of etcd members in the cluster reported by etcd API.
Members int32 `json:"members,omitempty"`
// CurrentVersion is the version of the etcd cluster.
CurrentVersion string `json:"currentVersion,omitempty"`
// Phase indicates the state of the EtcdCluster.
Phase string `json:"phase,omitempty"`
// Conditions represent the latest available observations of the EtcdCluster's state.
// +optional
// +patchMergeKey=type
// +patchStrategy=merge
// +listType=atomic
Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type"`
Member

I think we need to get consensus on the field list first. I think we need to record each member's status. cc @ivanvc @jmhbnz @jberkus @neolit123 @hakman @justinsb @ArkaSaha30

My draft thought:

- ReadyReplicas (similar to sts's ReadyReplicas)
- CurrentReplicas (similar to sts's CurrentReplicas)
- MemberCount (how many members in the etcd cluster)
- Members (a slice)
  - name: etcd-1
    healthy: true
    Alarm/Error: nil
    version: 3.5.21
    storageVersion: 3.5.0
    downgradeInfo: nil
    dbSize: 5000000
    dbSizeInUse: 1000000
  - name: etcd-2
    healthy: true
    Alarm/Error: nil
    version: 3.5.21
    storageVersion: 3.5.0
    downgradeInfo: nil
    dbSize: 5000000
    dbSizeInUse: 1000000
  - name: etcd-3
    healthy: true
    Alarm/Error: nil
    version: 3.5.21
    storageVersion: 3.5.0
    downgradeInfo: nil
    dbSize: 5000000
    dbSizeInUse: 1000000
- Conditions []metav1.Condition
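
Purely as an illustration, the draft above could map to Go API types along these lines; every name, type, and JSON tag below is a placeholder assumption for discussion, not an agreed design:

// Hypothetical API types sketching the per-member status draft above.
// All names, types, and JSON tags are assumptions for discussion only.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// EtcdMemberStatus would report the observed state of a single etcd member.
type EtcdMemberStatus struct {
	Name           string `json:"name"`
	Healthy        bool   `json:"healthy"`
	Version        string `json:"version,omitempty"`
	StorageVersion string `json:"storageVersion,omitempty"`
	// Alarm mirrors any active etcd alarm (e.g. NOSPACE); empty when none.
	Alarm string `json:"alarm,omitempty"`
	// DBSize and DBSizeInUse are reported in bytes; downgradeInfo omitted for brevity.
	DBSize      int64 `json:"dbSize,omitempty"`
	DBSizeInUse int64 `json:"dbSizeInUse,omitempty"`
}

// EtcdClusterStatus would aggregate cluster- and member-level observations.
type EtcdClusterStatus struct {
	ReadyReplicas   int32              `json:"readyReplicas,omitempty"`
	CurrentReplicas int32              `json:"currentReplicas,omitempty"`
	MemberCount     int32              `json:"memberCount,omitempty"`
	Members         []EtcdMemberStatus `json:"members,omitempty"`
	Conditions      []metav1.Condition `json:"conditions,omitempty"`
}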

Member Author

I was working in this direction last weekend. Will post an updated draft soon.

Member Author

I like the members slice idea. It provides more fine-grained info. Would love to work on that if we agree on it.

@neolit123 neolit123 May 13, 2025

I don't mind if we push for an API that covers a number of interesting fields in the initial versions of the operator, but I think we should probably draft a Google doc with a proposal, and for the next version just push out a very simple spec that has just the etcd version and maybe health: true/false. Note: even the health status needs some discussion of what it entails.

@ballista01 are you willing to draft such an API doc where we can rally discussion?

Member Author

@neolit123 Sure! I'll post the link in slack.

Contributor

What's the argument for the
dbSize: 5000000 dbSizeInUse: 1000000

... fields? Is there some way that cluster operators would use that info that requires it to be available via the kubernetes API, instead of just via etcdctl?

Member

dbSize: 5000000 dbSizeInUse: 1000000

etcd-operator can only get such info from the etcd cluster via the etcd SDK API (similar to etcdctl). One possible use case is automatic compaction & defragmentation. But as mentioned, we can get such info from the etcd cluster directly, so in general, any info that we can get from the etcd cluster directly might not need to go into EtcdCluster.Status.

As we discussed in the community meeting, one action item for now:

  • Investigate best practices for populating the Status: for example, how K8s populates & uses the StatefulSet's Status, and how other well-known operators populate & use their Status.

In the long-term, if we add any field/info into EtcdCluster.Status, we should have a clear goal first.
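
For reference, here is a minimal sketch of how the operator could read that data via the etcd Go client's Maintenance API; the endpoint value below is a placeholder. This is the same information etcdctl endpoint status prints:

// Minimal sketch: fetch per-endpoint DB size via the etcd Go client.
// The endpoint URL is a placeholder, not an actual service name from this repo.
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://etcd-0.etcd-headless:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Status returns the same data as `etcdctl endpoint status`,
	// including Version, DbSize, and DbSizeInUse.
	resp, err := cli.Status(ctx, "http://etcd-0.etcd-headless:2379")
	if err != nil {
		panic(err)
	}
	fmt.Printf("version=%s dbSize=%d dbSizeInUse=%d\n", resp.Version, resp.DbSize, resp.DbSizeInUse)
}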

Contributor

Yeah, I'm basically looking for use cases of "someone is going to use other Kubernetes components to automate this" which is a good justification for it being available via the Kubernetes API.

Member Author

@ballista01 ballista01 Jul 22, 2025

My suggestion regarding the status field:

status:
  observedGeneration: 5                    # NEW: Top-level field indicating the latest .metadata.generation processed by the controller
  currentReplicas: 3                       # Unchanged: Matches the StatefulSet replica count
  currentVersion: 3.6.0-rc.3               # Unchanged: Represents the running etcd version of the cluster
  leaderId: 975eab669fe7b560               # Unchanged: Leader member ID for status observability
  memberCount: 3                           # Unchanged: Total number of etcd members
  readyReplicas: 3

  # lastDefragTime could be optionally added if controller manages defragmentation
  # lastDefragTime: "2025-07-20T12:00:00Z"   # OPTIONAL: Cluster-level defragmentation completion time; not included in this PR

  members:
    - id: 22f4b6fa7639917a
      name: test-cluster-2
      version: 3.6.0-rc.3
      isHealthy: true
      isLearner: false                     # NEW: Added to support learner promotion logic and troubleshooting
      # REMOVED: dbSize/dbSizeInUse → considered too dynamic; suggested to be exposed via /metrics
      # clientURL and peerURL omitted here for minimal diagnostic view, can be re-added under a feature gate

    - id: 52117ab448fed65d
      name: test-cluster-1
      version: 3.6.0-rc.3
      isHealthy: true
      isLearner: false

    - id: 4f8c9e671cbd2233
      name: test-cluster-0
      version: 3.6.0-rc.3
      isHealthy: true
      isLearner: false

  conditions:
    - type: Available
      status: "True"
      reason: ClusterReady
      message: Cluster is fully available and healthy
      observedGeneration: 5
      lastTransitionTime: "2025-07-02T18:52:06Z"
      # Unchanged: Matches K8s convention for status.conditions

    - type: Progressing
      status: "False"
      reason: ReconcileSuccess
      message: Cluster reconciled to desired state
      observedGeneration: 5
      lastTransitionTime: "2025-07-02T18:52:06Z"
      # Unchanged: Indicates that spec and observed state are aligned

    - type: Degraded
      status: "False"
      reason: ClusterReady
      message: Cluster is healthy
      observedGeneration: 5
      lastTransitionTime: "2025-07-02T18:50:57Z"
      # Unchanged: Used to signal partial failures if any member is unhealthy

  # Note: dbSize and dbSizeInUse fields are REMOVED in this revision.
  # Rationale: These are volatile metrics not suitable for status fields;
  # better exposed via Prometheus-compatible /metrics endpoint for observability.
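
As a sketch of how the conditions above could be maintained consistently, the standard apimachinery helper can be used; the wrapper name setCondition and the example call site are illustrative assumptions, not the final helper in this PR:

// Illustrative condition helper built on k8s.io/apimachinery; the wrapper name
// and the example call site are assumptions, not the final code in this PR.
package controller

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setCondition upserts a condition of the given type; SetStatusCondition only
// bumps lastTransitionTime when the condition's status actually flips.
func setCondition(conditions *[]metav1.Condition, condType string, status metav1.ConditionStatus,
	reason, message string, observedGeneration int64) {
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:               condType,
		Status:             status,
		Reason:             reason,
		Message:            message,
		ObservedGeneration: observedGeneration,
	})
}

// Example call site inside Reconcile (cluster is the *EtcdCluster being reconciled):
//
//	setCondition(&cluster.Status.Conditions, "Available", metav1.ConditionTrue,
//	    "ClusterReady", "Cluster is fully available and healthy", cluster.Generation)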

Contributor

This is looking really good. However, I'm opening a discussion so that we can hammer out the final field list: issue #182.

@jberkus
Contributor

jberkus commented May 19, 2025

/ok-to-test

@ballista01
Member Author

Hi, I know this PR is getting quite large, so I wanted to share my plan upfront. My main goal here is to use this as a tracker PR to run the full CI pipeline and give everyone a complete picture of the proposed changes.

Once CI passes and we're happy with the overall direction, I'll split this up into smaller sub-PRs that will be much easier to review and merge.

I also noticed that after the foundational work, the current monolithic Reconcile function makes it tricky to write unit tests for the new status logic.

So, my next major step will likely be to refactor the Reconcile loop into smaller testable helper methods. I figure this might be a common challenge, so if there are already any existing issues, design docs, or discussions about refactoring the reconciler, please let me know! I'd love to build on any existing work.

I may start some exploratory work on the refactoring, but the final implementation in the smaller PRs will absolutely follow whatever we decide on as a community.

@ballista01 ballista01 force-pushed the etcd-cluster-status branch 2 times, most recently from c3e3b33 to 1ea2d50 on September 29, 2025 at 09:56
@ballista01
Member Author

ballista01 commented Sep 29, 2025

It seems that with the new status & condition reporting feature, the cyclomatic complexity of Reconcile is too high. I'll try to get #180 rebased and merged first to reduce the complexity of Reconcile.

…s integration, and etcd health improvements

Signed-off-by: Wenxue Zhao <[email protected]>
@ballista01
Member Author

Still working on reducing the complexity of status & condition reporting code in internal/controller/etcdcluster_controller.go.

- add condition helpers for consistent updates
- reconcile controller to patch status and observedGeneration each loop

Signed-off-by: Wenxue Zhao <[email protected]>
@k8s-ci-robot

@ballista01: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-etcd-operator-verify
Commit: 9de5dcf
Required: true
Rerun command: /test pull-etcd-operator-verify

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
